Improving the performance of minimizers and winnowing schemes
Internal identifier: 000D39 (Main/Exploration); previous: 000D38; next: 000D40
Authors: Guillaume Marçais [United States]; David Pellow [Israel]; Daniel Bork [United States]; Yaron Orenstein [United States]; Ron Shamir [Israel]; Carl Kingsford [United States]
Source:
- Bioinformatics [1367-4803]; 2017.
French descriptors
- KwdFr :
- MESH :
English descriptors
- KwdEn :
- MESH :
- methods: Genomics; Sequence Analysis, DNA.
- Algorithms; Genome, Human; Humans; Software.
Abstract
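The minimizers scheme is a method for selecting k-mers from sequences. It is used in many bioinformatics software tools to bin comparable sequences or to sample a sequence in a deterministic fashion at approximately regular intervals, in order to reduce memory consumption and processing time. Although very useful, the minimizers selection procedure has undesirable behaviors (e.g. too many k-mers are selected when processing certain sequences). Some of these problems were already known to the authors of the minimizers technique, and the natural lexicographic ordering of k-mers used by minimizers was recognized as their origin. Many software tools using minimizers employ ad hoc variations of the lexicographic order to alleviate those issues.
We provide an in-depth analysis of the effect of k-mer ordering on the performance of the minimizers technique. By using small universal hitting sets (a recently defined concept), we show how to significantly improve the performance of minimizers and avoid some of its worse behaviors. Based on these results, we encourage bioinformatics software developers to use an ordering based on a universal hitting set or, if not possible, a randomized ordering, rather than the lexicographic order. This analysis also settles negatively a conjecture (by Schleimer et al.) on the expected density of minimizers in a random sequence.
The software used for this analysis is available on GitHub: https://github.com/gmarcais/minimizers.git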
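The following Python sketch illustrates the kind of selection procedure the abstract describes. It is not taken from the authors' software: the function names, the window parameter `w`, the leftmost tie-breaking rule, and the use of SHA-256 as a stand-in for a randomized ordering are all assumptions made for illustration. In each window of `w` consecutive k-mers it selects the smallest one under a chosen ordering, then reports the density (the fraction of k-mer positions selected). Orderings based on universal hitting sets are not shown.

```python
import hashlib


def kmers(seq, k):
    """Return all k-mers of seq, ordered by start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def minimizer_positions(seq, k, w, key):
    """Start positions of the k-mers selected by a minimizers scheme.

    For every window of w consecutive k-mers, the k-mer that is smallest
    under `key` is selected. Ties go to the leftmost k-mer (an assumption
    of this sketch).
    """
    ks = kmers(seq, k)
    selected = set()
    for start in range(len(ks) - w + 1):
        window = range(start, start + w)
        selected.add(min(window, key=lambda i: (key(ks[i]), i)))
    return sorted(selected)


def density(seq, k, w, key):
    """Fraction of k-mer positions that are selected."""
    return len(minimizer_positions(seq, k, w, key)) / (len(seq) - k + 1)


def lexicographic(kmer):
    """Natural lexicographic ordering of k-mers."""
    return kmer


def hashed(kmer, salt="demo"):
    """Pseudo-random ordering: compare k-mers by a salted hash (illustrative)."""
    return hashlib.sha256((salt + kmer).encode()).hexdigest()


if __name__ == "__main__":
    seq = "ACGTTGCAACGTAGCTAGCTAGGATCC"  # arbitrary toy sequence
    for name, key in (("lexicographic", lexicographic), ("hashed", hashed)):
        print(name, minimizer_positions(seq, 3, 4, key), density(seq, 3, 4, key))
```

Passing the ordering as a `key` function keeps the selection logic independent of the ordering, so different orderings can be compared on the same sequence.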
Url:
DOI: 10.1093/bioinformatics/btx235
PubMed: 28881970
PubMed Central: 5870760
Affiliations:
Links toward previous steps (curation, corpus...)
- to stream Pmc, to step Corpus: 000B19
- to stream Pmc, to step Curation: 000B19
- to stream Pmc, to step Checkpoint: 000822
- to stream PubMed, to step Corpus: 000B64
- to stream PubMed, to step Curation: 000B64
- to stream PubMed, to step Checkpoint: 000C61
- to stream Ncbi, to step Merge: 001B68
- to stream Ncbi, to step Curation: 001B68
- to stream Ncbi, to step Checkpoint: 001B68
- to stream Main, to step Merge: 000D42
- to stream Main, to step Curation: 000D39
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Improving the performance of minimizers and winnowing schemes</title>
<author><name sortKey="Marcais, Guillaume" sort="Marcais, Guillaume" uniqKey="Marcais G" first="Guillaume" last="Marçais">Guillaume Marçais</name>
<affiliation wicri:level="4"><nlm:aff id="btx235-aff1">Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA</wicri:regionArea>
<placeName><region type="state">Pennsylvanie</region>
<settlement type="city">Pittsburgh</settlement>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
</author>
<author><name sortKey="Pellow, David" sort="Pellow, David" uniqKey="Pellow D" first="David" last="Pellow">David Pellow</name>
<affiliation wicri:level="1"><nlm:aff id="btx235-aff2">Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel</nlm:aff>
<country xml:lang="fr">Israël</country>
<wicri:regionArea>Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv</wicri:regionArea>
<wicri:noRegion>Tel-Aviv</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Bork, Daniel" sort="Bork, Daniel" uniqKey="Bork D" first="Daniel" last="Bork">Daniel Bork</name>
<affiliation wicri:level="4"><nlm:aff id="btx235-aff1">Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA</wicri:regionArea>
<placeName><region type="state">Pennsylvanie</region>
<settlement type="city">Pittsburgh</settlement>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
</author>
<author><name sortKey="Orenstein, Yaron" sort="Orenstein, Yaron" uniqKey="Orenstein Y" first="Yaron" last="Orenstein">Yaron Orenstein</name>
<affiliation wicri:level="2"><nlm:aff id="btx235-aff3">Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Shamir, Ron" sort="Shamir, Ron" uniqKey="Shamir R" first="Ron" last="Shamir">Ron Shamir</name>
<affiliation wicri:level="1"><nlm:aff id="btx235-aff2">Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel</nlm:aff>
<country xml:lang="fr">Israël</country>
<wicri:regionArea>Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv</wicri:regionArea>
<wicri:noRegion>Tel-Aviv</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
<affiliation wicri:level="4"><nlm:aff id="btx235-aff1">Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA</wicri:regionArea>
<placeName><region type="state">Pennsylvanie</region>
<settlement type="city">Pittsburgh</settlement>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">28881970</idno>
<idno type="pmc">5870760</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870760</idno>
<idno type="RBID">PMC:5870760</idno>
<idno type="doi">10.1093/bioinformatics/btx235</idno>
<date when="2017">2017</date>
<idno type="wicri:Area/Pmc/Corpus">000B19</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000B19</idno>
<idno type="wicri:Area/Pmc/Curation">000B19</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000B19</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000822</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000822</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:28881970</idno>
<idno type="wicri:Area/PubMed/Corpus">000B64</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000B64</idno>
<idno type="wicri:Area/PubMed/Curation">000B64</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000B64</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000C61</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">000C61</idno>
<idno type="wicri:Area/Ncbi/Merge">001B68</idno>
<idno type="wicri:Area/Ncbi/Curation">001B68</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001B68</idno>
<idno type="wicri:doubleKey">1367-4803:2017:Marcais G:improving:the:performance</idno>
<idno type="wicri:Area/Main/Merge">000D42</idno>
<idno type="wicri:Area/Main/Curation">000D39</idno>
<idno type="wicri:Area/Main/Exploration">000D39</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Improving the performance of minimizers and winnowing schemes</title>
<author><name sortKey="Marcais, Guillaume" sort="Marcais, Guillaume" uniqKey="Marcais G" first="Guillaume" last="Marçais">Guillaume Marçais</name>
<affiliation wicri:level="4"><nlm:aff id="btx235-aff1">Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA</wicri:regionArea>
<placeName><region type="state">Pennsylvanie</region>
<settlement type="city">Pittsburgh</settlement>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
</author>
<author><name sortKey="Pellow, David" sort="Pellow, David" uniqKey="Pellow D" first="David" last="Pellow">David Pellow</name>
<affiliation wicri:level="1"><nlm:aff id="btx235-aff2">Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel</nlm:aff>
<country xml:lang="fr">Israël</country>
<wicri:regionArea>Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv</wicri:regionArea>
<wicri:noRegion>Tel-Aviv</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Bork, Daniel" sort="Bork, Daniel" uniqKey="Bork D" first="Daniel" last="Bork">Daniel Bork</name>
<affiliation wicri:level="4"><nlm:aff id="btx235-aff1">Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA</wicri:regionArea>
<placeName><region type="state">Pennsylvanie</region>
<settlement type="city">Pittsburgh</settlement>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
</author>
<author><name sortKey="Orenstein, Yaron" sort="Orenstein, Yaron" uniqKey="Orenstein Y" first="Yaron" last="Orenstein">Yaron Orenstein</name>
<affiliation wicri:level="2"><nlm:aff id="btx235-aff3">Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Shamir, Ron" sort="Shamir, Ron" uniqKey="Shamir R" first="Ron" last="Shamir">Ron Shamir</name>
<affiliation wicri:level="1"><nlm:aff id="btx235-aff2">Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel</nlm:aff>
<country xml:lang="fr">Israël</country>
<wicri:regionArea>Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv</wicri:regionArea>
<wicri:noRegion>Tel-Aviv</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
<affiliation wicri:level="4"><nlm:aff id="btx235-aff1">Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA</wicri:regionArea>
<placeName><region type="state">Pennsylvanie</region>
<settlement type="city">Pittsburgh</settlement>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
</author>
</analytic>
<series><title level="j">Bioinformatics</title>
<idno type="ISSN">1367-4803</idno>
<idno type="eISSN">1367-4811</idno>
<imprint><date when="2017">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Genome, Human</term>
<term>Genomics (methods)</term>
<term>Humans</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>Algorithmes</term>
<term>Analyse de séquence d'ADN ()</term>
<term>Génome humain</term>
<term>Génomique ()</term>
<term>Humains</term>
<term>Logiciel</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Genomics</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Genome, Human</term>
<term>Humans</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Génome humain</term>
<term>Génomique</term>
<term>Humains</term>
<term>Logiciel</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><title>Abstract</title>
<sec id="SA1"><title>Motivation</title>
<p>The minimizers scheme is a method for selecting <italic>k</italic>-mers from sequences. It is used in many bioinformatics software tools to bin comparable sequences or to sample a sequence in a deterministic fashion at approximately regular intervals, in order to reduce memory consumption and processing time. Although very useful, the minimizers selection procedure has undesirable behaviors (e.g. too many <italic>k</italic>-mers are selected when processing certain sequences). Some of these problems were already known to the authors of the minimizers technique, and the natural lexicographic ordering of <italic>k</italic>-mers used by minimizers was recognized as their origin. Many software tools using minimizers employ ad hoc variations of the lexicographic order to alleviate those issues.</p>
</sec>
<sec id="SA2"><title>Results</title>
<p>We provide an in-depth analysis of the effect of <italic>k</italic>-mer ordering on the performance of the minimizers technique. By using small universal hitting sets (a recently defined concept), we show how to significantly improve the performance of minimizers and avoid some of its worse behaviors. Based on these results, we encourage bioinformatics software developers to use an ordering based on a universal hitting set or, if not possible, a randomized ordering, rather than the lexicographic order. This analysis also settles negatively a conjecture (by Schleimer <italic>et al.</italic>) on the expected density of minimizers in a random sequence.</p>
</sec>
<sec id="SA4"><title>Availability and Implementation</title>
<p>The software used for this analysis is available on GitHub: <ext-link ext-link-type="uri" xlink:href="https://github.com/gmarcais/minimizers.git">https://github.com/gmarcais/minimizers.git</ext-link>.</p>
</sec>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct><analytic><author><name sortKey="Chikhi, R" uniqKey="Chikhi R">R. Chikhi</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Chikhi, R" uniqKey="Chikhi R">R. Chikhi</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="De Bruijn, N G" uniqKey="De Bruijn N">N.G. de Bruijn</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Deorowicz, S" uniqKey="Deorowicz S">S. Deorowicz</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Grabowski, S" uniqKey="Grabowski S">S. Grabowski</name>
</author>
<author><name sortKey="Raniszewski, M" uniqKey="Raniszewski M">M. Raniszewski</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Li, H" uniqKey="Li H">H. Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Li, Y" uniqKey="Li Y">Y. Li</name>
</author>
<author><name sortKey="Yan, X" uniqKey="Yan X">X. Yan</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Orenstein, Y" uniqKey="Orenstein Y">Y. Orenstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Orenstein, Y" uniqKey="Orenstein Y">Y. Orenstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Roberts, M" uniqKey="Roberts M">M. Roberts</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Roberts, M" uniqKey="Roberts M">M. Roberts</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Schleimer, S" uniqKey="Schleimer S">S. Schleimer</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wood, D E" uniqKey="Wood D">D.E. Wood</name>
</author>
<author><name sortKey="Salzberg, S L" uniqKey="Salzberg S">S.L. Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ye, C" uniqKey="Ye C">C. Ye</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<affiliations><list><country><li>Israël</li>
<li>États-Unis</li>
</country>
<region><li>Massachusetts</li>
<li>Pennsylvanie</li>
</region>
<settlement><li>Pittsburgh</li>
</settlement>
<orgName><li>Université Carnegie-Mellon</li>
</orgName>
</list>
<tree><country name="États-Unis"><region name="Pennsylvanie"><name sortKey="Marcais, Guillaume" sort="Marcais, Guillaume" uniqKey="Marcais G" first="Guillaume" last="Marçais">Guillaume Marçais</name>
</region>
<name sortKey="Bork, Daniel" sort="Bork, Daniel" uniqKey="Bork D" first="Daniel" last="Bork">Daniel Bork</name>
<name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
<name sortKey="Orenstein, Yaron" sort="Orenstein, Yaron" uniqKey="Orenstein Y" first="Yaron" last="Orenstein">Yaron Orenstein</name>
</country>
<country name="Israël"><noRegion><name sortKey="Pellow, David" sort="Pellow, David" uniqKey="Pellow D" first="David" last="Pellow">David Pellow</name>
</noRegion>
<name sortKey="Shamir, Ron" sort="Shamir, Ron" uniqKey="Shamir R" first="Ron" last="Shamir">Ron Shamir</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000D39 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000D39 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= Main |étape= Exploration |type= RBID |clé= PMC:5870760 |texte= Improving the performance of minimizers and winnowing schemes }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i -Sk "pubmed:28881970" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd \
       | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.